Rita Cucchiara

Fashion-RAG: Multimodal Fashion Image Editing via Retrieval-Augmented Generation

Apr 18, 2025
Via arXiv

Diffusion Transformers for Tabular Data Time Series Generation

Apr 10, 2025
Via arXiv

LLaVA-MORE: A Comparative Study of LLMs and Visual Backbones for Enhanced Visual Instruction Tuning

Mar 19, 2025
Via arXiv

Image Captioning Evaluation in the Age of Multimodal LLMs: Challenges and Future Perspectives

Mar 18, 2025
Via arXiv

Hyperbolic Safety-Aware Vision-Language Models

Mar 15, 2025
Via arXiv

DitHub: A Modular Framework for Incremental Open-Vocabulary Object Detection

Mar 12, 2025
Via arXiv

ToFu: Visual Tokens Reduction via Fusion for Multi-modal, Multi-patch, Multi-image Task

Mar 06, 2025
Via arXiv

Recurrence-Enhanced Vision-and-Language Transformers for Robust Multimodal Document Retrieval

Mar 03, 2025
Via arXiv

Perceive, Query & Reason: Enhancing Video QA with Question-Guided Temporal Queries

Dec 26, 2024
Via arXiv

Causal Graphical Models for Vision-Language Compositional Understanding

Dec 12, 2024
Via arXiv